AITopics

Country:

Europe (0.67)
North America > United States > New York (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.57)

Neural Information Processing Systems, Feb-16-2026, 09:56:10 GMT

Author Contributions

A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective In this appendix, we will derive Eq. 4. Analogously to Eq. 3, we optimize the following objective: max

artificial intelligence, deep learning, machine learning, (17 more...)

Country:

Asia > Japan (0.05)
North America > United States > Virginia (0.04)
North America > United States > Hawaii (0.04)
(5 more...)

Genre: Research Report (0.46)

Industry:

Government (1.00)
Education (0.93)
Law (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Catalina, Daniel Sánchez, Cantwell, George T.

Model inference for ranking from pairwise comparisons

arXiv.org Machine Learning, Dec-18-2025

We consider the problem of ranking objects from noisy pairwise comparisons, for example, ranking tennis players from the outcomes of matches. We follow a standard approach to this problem and assume that each object has an unobserved strength and that the outcome of each comparison depends probabilistically on the strengths of the comparands. However, we do not assume to know a priori how skills affect outcomes. Instead, we present an efficient algorithm for simultaneously inferring both the unobserved strengths and the function that maps strengths to probabilities. Despite this problem being under-constrained, we present experimental evidence that the conclusions of our Bayesian approach are robust to different model specifications. We include several case studies to exemplify the method on real-world data sets.

algorithm, bradley-terry model, kernel, (17 more...)

arXiv.org Machine Learning

2512.15269

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.40)

Industry: Leisure & Entertainment > Sports > Tennis (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Chouliaras, Andreas, Chatzopoulos, Dimitris

Maximizing the efficiency of human feedback in AI alignment: a comparative analysis

arXiv.org Artificial Intelligence, Dec-2-2025

Reinforcement Learning from Human Feedback (RLHF) relies on preference modeling to align machine learning systems with human values, yet the popular approach of random pair sampling with Bradley-Terry modeling is statistically limited and inefficient under constrained annotation budgets. In this work, we explore alternative sampling and evaluation strategies for preference inference in RLHF, drawing inspiration from areas such as game theory, statistics, and social choice theory. Our best-performing method, Swiss InfoGain, employs a Swiss tournament system with a proxy mutual-information-gain pairing rule, which significantly outperforms all other methods in constrained annotation budgets while also being more sample-efficient. Even in high-resource settings, we can identify superior alternatives to the Bradley-Terry baseline. Our experiments demonstrate that adaptive, resource-aware strategies reduce redundancy, enhance robustness, and yield statistically significant improvements in preference learning, highlighting the importance of balancing alignment quality with human workload in RLHF pipelines.

artificial intelligence, human feedback, machine learning, (16 more...)

2511.12796

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Leisure & Entertainment > Games > Chess (0.31)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Neural Information Processing Systems, Nov-21-2025, 03:53:06 GMT

Learning from Group Comparisons: Exploiting Higher Order Interactions

Yao Li, Minhao Cheng, Kevin Fujii, Fushing Hsieh, Cho-Jui Hsieh

We study the problem of learning from group comparisons, with applications in predicting outcomes of sports and online games. Most of the previous works in this area focus on learning individual effects: they assume each player has an underlying score, and the "ability" of the team is modeled by the sum of team

artificial intelligence, interaction, machine learning, (18 more...)

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > California > Yolo County > Davis (0.05)
Asia > Middle East > Jordan (0.04)
North America > Canada > Quebec > Montreal (0.04)

Genre: Research Report > New Finding (0.47)

Industry: Leisure & Entertainment > Games > Computer Games (0.89)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, Oct-21-2025

Efficient Portfolio Selection through Preference Aggregation with Quicksort and the Bradley-Terry Model

Ge, Yurun, Böttcher, Lucas, Chou, Tom, D'Orsogna, Maria R.

How to allocate limited resources to projects that will yield the greatest long-term benefits is a problem that often arises in decision-making under uncertainty. For example, organizations may need to evaluate and select innovation projects with risky returns. Similarly, when allocating resources to research projects, funding agencies are tasked with identifying the most promising proposals based on idiosyncratic criteria. Finally, in participatory budgeting, a local community may need to select a subset of public projects to fund. Regardless of context, agents must estimate the uncertain values of a potentially large number of projects. Developing parsimonious methods to compare these projects, and aggregating agent evaluations so that the overall benefit is maximized, are critical in assembling the best project portfolio. Unlike in standard sorting algorithms, evaluating projects on the basis of uncertain long-term benefits introduces additional complexities. We propose comparison rules based on Quicksort and the Bradley-Terry model, which connects rankings to pairwise "win" probabilities. In our model, each agent determines win probabilities of a pair of projects based on his or her specific evaluation of the projects' long-term benefit. The win probabilities are then appropriately aggregated and used to rank projects. Several of the methods we propose perform better than the two most effective aggregation methods currently available. Additionally, our methods can be combined with sampling techniques to significantly reduce the number of pairwise comparisons. We also discuss how the Bradley-Terry portfolio selection approach can be implemented in practice.
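As a rough illustration of how the Bradley-Terry model connects pairwise "win" probabilities to a ranking, the sketch below fits item strengths to a matrix of win counts with the standard minorization-maximization (Zermelo) update. It is a generic textbook fit, not the aggregation rules or the Quicksort-based procedure proposed in the paper. The function names `bt_win_probability` and `fit_bradley_terry` and the example win counts are hypothetical, and the sketch assumes every item has at least one win and one loss so that the maximum-likelihood estimate exists.

```python
def bt_win_probability(s_i, s_j):
    """Bradley-Terry probability that an item with strength s_i beats one with s_j."""
    return s_i / (s_i + s_j)


def fit_bradley_terry(wins, n_iter=500, tol=1e-12):
    """Estimate strengths from wins[i][j] = number of times item i beat item j.

    Uses the classic MM (Zermelo) update:
        s_i <- W_i / sum_{j != i} n_ij / (s_i + s_j)
    where W_i is the total number of wins of item i and n_ij is the number
    of comparisons between i and j. Strengths are normalized to sum to 1.
    Assumption: every item has at least one win and one loss.
    """
    n = len(wins)
    s = [1.0 / n] * n
    for _ in range(n_iter):
        new_s = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (s[i] + s[j])
                for j in range(n)
                if j != i
            )
            new_s.append(total_wins / denom)
        norm = sum(new_s)
        new_s = [v / norm for v in new_s]
        converged = max(abs(a - b) for a, b in zip(new_s, s)) < tol
        s = new_s
        if converged:
            break
    return s


# Hypothetical win counts for three projects (illustrative data only).
wins = [
    [0, 3, 4],
    [1, 0, 3],
    [0, 1, 0],
]
strengths = fit_bradley_terry(wins)
# Rank items from strongest to weakest by fitted strength.
ranking = sorted(range(len(strengths)), key=lambda i: -strengths[i])
```

The fitted strengths can be passed back to `bt_win_probability` to predict the outcome of a comparison that was never observed, which is what makes the model useful when only a subset of pairs has been compared.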

artificial intelligence, pairwise comparison, win probability, (17 more...)

doi: 10.1016/j.jocs.2025.102728

2504.16093

Country:

Europe (1.00)
North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (1.00)

Industry:

Government (0.68)
Leisure & Entertainment > Games (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Aczel, Till, Theis, Lucas, Wattenhofer, Roger

Efficient Bayesian Inference from Noisy Pairwise Comparisons

arXiv.org Artificial Intelligence, Oct-13-2025

Evaluating generative models is challenging because standard metrics often fail to reflect human preferences. Human evaluations are more reliable but costly and noisy, as participants vary in expertise, attention, and diligence. Pairwise comparisons improve consistency, yet aggregating them into overall quality scores requires careful modeling. Bradley-Terry-based methods update item scores from comparisons, but existing approaches either ignore rater variability or lack convergence guarantees, limiting robustness and interpretability. We introduce BBQ, a Bayesian Bradley-Terry variant that explicitly models rater quality, downweighting or removing unreliable participants, and provides guaranteed monotonic likelihood convergence through an Expectation-Maximization algorithm. Empirical results show that BBQ achieves faster convergence, well-calibrated uncertainty estimates, and more robust, interpretable rankings compared to baseline Bradley-Terry models, even with noisy or crowdsourced raters. This framework enables more reliable and cost-effective human evaluation of generative models.

artificial intelligence, machine learning, natural language, (21 more...)

2510.09333

Country: Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.95)

Mikkola, Petrus, Acerbi, Luigi, Klami, Arto

Score-Based Density Estimation from Pairwise Comparisons

arXiv.org Artificial Intelligence, Oct-13-2025

We study density estimation from pairwise comparisons, motivated by expert knowledge elicitation and learning from human feedback. We relate the unobserved target density to a tempered winner density (marginal density of preferred choices), learning the winner's score via score-matching. This allows estimating the target by "de-tempering" the estimated winner density's score. We prove that the score vectors of the belief and the winner density are collinear, linked by a position-dependent tempering field. We give analytical formulas for this field and propose an estimator for it under the Bradley-Terry model. Using a diffusion model trained on tempered samples generated via score-scaled annealed Langevin dynamics, we can learn complex multivariate belief densities of simulated experts, from only hundreds to thousands of pairwise comparisons.

artificial intelligence, experiment, machine learning, (15 more...)

2510.09146

Country: Europe > Finland (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Neural Information Processing Systems, Oct-9-2025, 04:01:13 GMT

Author Contributions

A.1 Deriving the Optimum of the KL-Constrained Reward Maximization Objective In this appendix, we will derive Eq. 4. Analogously to Eq. 3, we optimize the following objective: max

artificial intelligence, deep learning, machine learning, (17 more...)